
APP_ID=10064339 









How Rings assign address space

Step 1: align incoming address to self (to some power of 2)
Step 2: assign the result to self address
Step 3: next_addr = self_addr + self_addr_space; // number of registers used locally
Step 4: send down next_addr



Example: 

DMA needs 16 addresses
UART needs 4
Timer needs 256

Enumerate message:
Addr = 8

self = 32

Addr = 512
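
The example can be checked with a small sketch of the four steps. It assumes that "align" means rounding the incoming address up to the next multiple of the member's own power-of-2 space; the function names `align_up` and `enumerate_ring` are illustrative and do not come from the drawings.

```python
def align_up(addr, size):
    """Round addr up to the next multiple of size (size must be a power of 2)."""
    return (addr + size - 1) & ~(size - 1)


def enumerate_ring(incoming_addr, spaces):
    """Walk the ring once and return (self addresses, address sent on by the last member).

    spaces lists how many addresses each member needs, in ring order.
    """
    assigned = []
    addr = incoming_addr
    for space in spaces:
        if space <= 0 or space & (space - 1):
            raise ValueError("address space must be a power of 2")
        self_addr = align_up(addr, space)   # steps 1 and 2
        assigned.append(self_addr)
        addr = self_addr + space            # step 3, then sent down (step 4)
    return assigned, addr


# Three members needing 16, 4 and 256 addresses; the enumerate message arrives with Addr = 8.
print(enumerate_ring(8, [16, 4, 256]))
```

Under these assumptions the second member gets self = 32 and the address leaving the third member is 512, which agrees with the values shown in the example.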



clk1



clk2 



If the delay between "clk1" and "clk2" is
greater than the delay from Q to D of the
second flip-flop, we have a race on our
hands, meaning the right-hand flip-flop will
sample the data of Q a whole clock period
early.
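
The condition can be written as a one-line check. This is only a sketch: the function names and the nanosecond units are assumptions chosen for illustration, and real timing analysis would use the delays reported after layout.

```python
def has_race(clk_skew_ns, q_to_d_delay_ns):
    """True when clk2 arrives later than the new data, so the second flip-flop samples a cycle early."""
    return clk_skew_ns > q_to_d_delay_ns


def race_margin_ns(clk_skew_ns, q_to_d_delay_ns):
    """Positive margin means the data path is slower than the clock skew (no race)."""
    return q_to_d_delay_ns - clk_skew_ns


print(has_race(1.2, 0.8))   # skew larger than the Q-to-D delay
print(has_race(0.3, 0.8))   # skew smaller than the Q-to-D delay
```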



[Figure: a flip-flop in compound A feeding a flip-flop in compound B]



clock runs with data 

The problem is a possible race.
However, we control the logic on
each flip-flop leaving the compound.
Because it is always the same standard
ring-interface module, we can ensure
that the delay will be at least enough
and, more importantly, easily checked
after layout.

[Figure: compound A with clka and data_a]



clock opposes data 

This arrangement has the advantage
of automatically ensuring the no-race
condition (at least in this simple case).

[Figure labels: clkb, compound B, data_b]



[Waveforms: clkb, clka, Qa, data_a]



data_a, which changes after clkb,
which is later than clka, is sampled
by clka. NO RACE.



[Figure: compound A and compound B; labels clka, Qa, latch 90, OK]

[Waveforms: clock, clka, clkb, data_a; clock uncertainty range]



data_a leaving the bridge goes to member "b" and
should be sampled there by the rising edge of clkb. clkb
lags a lot behind clka of the bridge. As clearly seen
from the waveforms, a race is imminent. Here we
should add latches on all the data lines (90).
Adding a latch works, however, only if the delay between
clka and clkb is less than 75% of the cycle time;
otherwise the uncertainty kills the usable time. This
sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.
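
The 75% rule and the resulting member limit can be sketched as follows. The sketch assumes, purely for illustration, that the clock lag grows by the same fixed amount at every member; the drawings do not state how the lag accumulates, and the function names are hypothetical. Times are integers in picoseconds to avoid rounding surprises.

```python
def latch_fix_works(skew_ps, cycle_ps):
    """The latch fix is usable only while the clka-to-clkb delay is below 75% of the cycle."""
    return 4 * skew_ps < 3 * cycle_ps


def max_ring_members(cycle_ps, lag_per_member_ps):
    """Largest n with n * lag_per_member_ps < 0.75 * cycle_ps (assumes a uniform lag per member)."""
    if lag_per_member_ps <= 0:
        raise ValueError("lag per member must be positive")
    return (3 * cycle_ps - 1) // (4 * lag_per_member_ps)


# 10 ns cycle, 0.5 ns of added clock lag per member (illustrative numbers only).
print(latch_fix_works(7000, 10000))
print(max_ring_members(10000, 500))
```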

[Waveforms: data, clock; clock uncertainty range]



Here, data_b leaves member "b" to be sampled by
clka in the bridge. But now clkb lags a lot behind
clka. This actually works to our advantage, if the
lag is smaller than the better part of a clock cycle. This
solution looks better because, between adjacent
members, we can take care to delay the data
beyond the danger zone of clock delay; the OK signals
are covered automatically, and the last-leg data is also
covered. The only signal not safe is the OK from the
bridge to the "b" member. It will need a latch in "b".



[Figure: big module with local_clock and local_data_out; data from previous member; clk; ring interface]



The local clock lags behind the
ring-interface clock of this
module, because we presume the
module is big. For data coming
out, this is not a problem: it changes
later than the ring-i/f flip-flops' clock.
However, for data entering the
module from the previous member,
a race is a possibility we must
look into.

If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.

Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "c". In this case
communication between "b" and "c" suffers degradation
when the peak traffic coincides.
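
Both effects can be illustrated with a small sketch. It assumes a unidirectional ring with four members in the order a, b, c, d; the member order and the function names are assumptions made for the example.

```python
def ring_hops(members, src, dst):
    """Number of links a message crosses on a one-way ring from src to dst."""
    return (members.index(dst) - members.index(src)) % len(members)


def links_used(members, src, dst):
    """The set of (from, to) links a message from src to dst occupies."""
    n = len(members)
    start = members.index(src)
    return {
        (members[(start + k) % n], members[(start + k + 1) % n])
        for k in range(ring_hops(members, src, dst))
    }


ring = ["a", "b", "c", "d"]
print(ring_hops(ring, "a", "b"))   # neighbour in ring direction
print(ring_hops(ring, "c", "b"))   # has to go almost all the way around
# Links shared by a->d traffic and b->c traffic, where peaks would collide.
print(links_used(ring, "a", "d") & links_used(ring, "b", "c"))
```

In this model the a-to-d stream occupies the same b-to-c link that the b-to-c stream needs, which is the contention behind the "peak latency" remark.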

The land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it gets to one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and gets quickly to D1. By the same magic,
a message from V1 to D2 is bypassed also;
a message from V1 to D1 is treated directly.




Enumeration is started by the "Anchor",
which assigns address 1 to itself. The results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is the "near" end, which got
enumeration label "3", and the "far" end,
marked 6.

The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.

So we have exactly one enforcer for each
ring.



3 near

11 far



In a land bridge ring, the situation is trickier. If V2
sends a message to address=5, the land bridge
diverts it at the 11/far end. It will re-appear at 3 and
start cycling forever.

We have to define an algorithm that will take
care of all cases.

Luckily there is a way.

The land bridge deals only with messages arriving
at the far end and being diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
going through are left alone.

[Figure: ring interface signals: type, addr, data, ok, scan test, reset, clk]

[Waveforms: clk; type (idle, msgA, msgB, idle); ok]



During the first clock, OK remains active while type
is msgA. This means that on the next clock
memberA may send a new message. memberA uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the FIFO in memberB is full. One
clock later the FIFO has a free entry, so OK returns to
1 and type returns to idle on the next clock. The return to idle
could also be a change to the next message, if there was
one.
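
The handshake can be replayed clock by clock with a short sketch. It assumes a simplified model in which the sender advances to its next message only when OK was active during the current clock, and otherwise holds the same value; the function name `drive_type` is illustrative.

```python
def drive_type(messages, ok_per_clock):
    """Return the value on the type lines for each clock.

    messages: values the sender wants to transmit, in order.
    ok_per_clock: OK as seen by the sender during each clock (1 = receiver can accept).
    """
    pending = list(messages)
    current = pending.pop(0) if pending else "idle"
    trace = []
    for ok in ok_per_clock:
        trace.append(current)
        if ok:
            # Accepted: move on to the next message, or go idle if none is left.
            current = pending.pop(0) if pending else "idle"
        # OK inactive: the same message stays on the lines for another clock.
    return trace


# OK is active, then drops for one clock while the FIFO in memberB is full.
print(drive_type(["msgA", "msgB"], [1, 0, 1, 1]))
```

With OK dropping for one clock, msgB occupies the type lines for two clocks before they return to idle, matching the waveform description.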



[Figure: member A and member B chained on the ring, each with ports imsg, iok, omsg, ook]



The incoming messages are examined first
to see if each is supervisor or work/program.
Work/program messages have an address field.
We check if it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during the
enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the implementation of
the unused bits.

[Figure labels: incoming address; address split mask; comparator; ours/through; don't-care part of self-address; part of the address that enters the member]
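
A minimal sketch of the comparison follows. It assumes a 24-bit address (matching the address_out[23:0] signals seen elsewhere in the drawings) and a member space that is a power of 2; `ADDR_BITS`, `split_mask` and `decode` are names chosen for the example.

```python
ADDR_BITS = 24  # assumption: 24-bit ring addresses


def split_mask(space):
    """Mask that keeps only the upper bits compared against the self-address register."""
    if space <= 0 or space & (space - 1):
        raise ValueError("address space must be a power of 2")
    return ((1 << ADDR_BITS) - 1) & ~(space - 1)


def decode(incoming, self_addr, space):
    """Return ("ours", internal_address) or ("through", None)."""
    mask = split_mask(space)
    if (incoming & mask) == (self_addr & mask):
        # Lower bits are passed inside as the internal address.
        return "ours", incoming & (space - 1)
    return "through", None


# A member enumerated at 0x100 that owns 256 addresses.
print(decode(0x105, 0x100, 256))
print(decode(0x205, 0x100, 256))
```

Because the self address is aligned to the member's own space, its low bits never take part in the comparison, which is why they can be treated as don't-care.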

[Figure: Member 300 containing the RIF and an Activation register; ports Rif_I_*, Rif_*, Rif_O_*, module_id; Address Space = 7]

The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.




The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We are
using the I/O and maybe
some of the logic.



Network Processor

Application Specific Accelerators: CRC, Encryption, Table Lookup, Hashing
Internal Memory: Fast, Unified, Multi-port
Peripheral Expansion: Enet, ATM, UART, USB, Serials
System Expansion Area
CPU (PP)
DMA, Smart FIFO, Ext. mem I/F

[Figure: Vobla Compound]

Internal Memory
Vobla Core: Fetch Unit (FTU), Load/Store (LSU), Program Sequencer (PSU), Register File (RFU), Arithmetic (DALU)
Core Debug
DMA Agent
Agent Bus

Other data

32 registers of 32 bits per task.
A set of indications per task, which control task execution scheduling.
An interface to adjacent resources.
Fast memory accessed by load/store instructions.

General registers
Special registers
Doorbell
Agent interface
Configuration
Internal memory
External memory

Per task and global registers.
Initiative by the PP.
Big area accessed via a DMA interface.

R1 register:

s - sticky bit
eq - equal/zero
lt - less than/negative
gt - greater than/positive
c - carry
mb - reflection of the RAM multireader busy indication

Frame structure of an example task type

Level 0: common task data; task fragment 1 data; task fragment 2 data; task fragment 3 data
Level 1: level 1 f1 data; level 1 f2 data; level 1 f3 data
Level 2: data of all level 2 functions

Size of the level 0 frame part is different for each task.
Size of the level 1 frame part is different per each task type.
Size of the level 2 frame part is constant.

[Legend: Flip Flop; Logic]

[Figure: multireader agent]

Signals: interface_address[15:0], type_out[7:0], address_out[23:0], data_out[63:0], ok2drive
Memory data: data[add0] to data[add7]; byte 7

Agent command (AID = multireader): opcode | options[5:0] | RA | AID | RB/imm8
Options: increment, first, loop, last (OP3)

Vobla entry in multireader: source address[23:0] | destination address[23:0] | count[7:0]
source address: source addr in sram
destination address: address of destination
count: number of bytes to transfer

[Figure: message sender agent]

Blocks: Vobla agent interface, main entry, shadow entry, output message encoder
Signals: type_out[7:0], address_out[23:0], data_out[63:0], reset, clk, ok2drive

Agent command (AID = message_sender): opcode | options[9:0] | RA | AID[4:0] | RB

With option[6] = 0, the message_sender entry is:
options[5:0] | raw_data[31:0] | raw_address[23:0] | type[7:0]
(message data | address of destination | message type)

With option[6] = 1, the message_sender entry is:
options[5:0] | raw_data[63:0] | raw_address[23:0]
(message data | address of destination; message type 10000000)



[Figure: DMA agent]

Blocks: doorbell (set mask), Vobla agent interface, token control, request entries (x2), DMA context table, input message decoder, output message encoder
Signals: stall vobla, t_id[5:0], type_in[7:0], write_interface_address[23:0], write_interface_data[31:0], base_address[7:0], type_out[7:0], address_out[23:0], data_out[63:0], ok2drive

[Figure: DMA request entry]

Agent command opcode (AID = dma agent)
Options: inquiry, urgent, dir, autosend, long address (OP2), set ack address (OP1)
Entry fields: dram address | sram address | tcount[7:0]
tcount: number of bytes to transfer or address modifier



[Figure: CRC agent]

Blocks: multireader, Vobla, register file, input buffer, tx_data mux, random number generator (TRD, on demand), 5, 10, 32 machine, checksum machine, bip16 machine
Signals: last_in_transfer, calculate, multireader data[31:0], stall vobla, CRC data, clk, reset

Agent command (AID = CRC agent): opcode | options[9:0] | RA | AID[4:0] | RB
Options: CRC type, data size, generate, operation, write

[Figure: timer agent]

Blocks: Vobla agent interface, control register, counter, prescaler, div by 2, time stamp register
Signals: clk, reset

Agent command (AID = timer agent): opcode | options[9:0] | RA | AID | RB/imm8

[Figure: doorbell agent]

Signals: rif write, int doorbell cs

Agent command (AID = doorbell): opcode | options[9:0] | RA | AID[4:0] | RB/imm8
Options: set/clear global (OP2), clear request (OP1), clear mask (OP0)

{0,0,0,0,0,mask_int_index[2:0]} write mask
or
{0,0,0,0,1,req_bit_index[2:0]} write request
or
{1,0,0,0,0,count_value[2:0]} write counter
or
{0,1,0,0,0,0,0,0} write TGMR
or
{1,1,0,0,0,0,0,urgent_value} write urgent
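
The five write encodings can be sketched as 8-bit values. This assumes each brace is a bit concatenation listed most significant bit first, which is a reading of the damaged figure text rather than a confirmed format; the function and constant names are hypothetical.

```python
def _field3(value):
    """Validate a 3-bit index or count field."""
    if not 0 <= value <= 7:
        raise ValueError("field must fit in 3 bits")
    return value


def write_mask(mask_int_index):
    return 0b00000000 | _field3(mask_int_index)   # {0,0,0,0,0,index[2:0]}


def write_request(req_bit_index):
    return 0b00001000 | _field3(req_bit_index)    # {0,0,0,0,1,index[2:0]}


def write_counter(count_value):
    return 0b10000000 | _field3(count_value)      # {1,0,0,0,0,count[2:0]}


WRITE_TGMR = 0b01000000                           # {0,1,0,0,0,0,0,0}


def write_urgent(urgent_value):
    if urgent_value not in (0, 1):
        raise ValueError("urgent_value is a single bit")
    return 0b11000000 | urgent_value              # {1,1,0,0,0,0,0,urgent}


print(format(write_request(3), "08b"))
print(format(write_urgent(1), "08b"))
```

Under this reading the top two bits select between mask/request, counter, TGMR and urgent writes, and bit 3 separates mask from request.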




[Figure: Trajan memory controller]

Memory side: Memory, Memory port; signals CS, Data, Addr, FIFO_RD, FIFO_WR
Trajan chip: WR FIFO, RD FIFO, WR FIFO used words counter, RD FIFO used words counter, Write request generator, Read request generator, Message sender, Ring interface
Hosts: Host #1 (PP), Host #2 (DMA), Host #3 (Ring extender)
Attached devices: Encryption, DSP, PCI

[Figure: FPGA]

Trajan memory interface; Trajan FIFO_WR input
DPR
Host address generator; RAM / FIFO
Target address generator; RAM / FIFO
Synchronizer (three instances)
Interface to external device (device specific)
Interrupt

[Figure: receive path between ring in and ring out]

setup registers:
doorbell, taskid and viscode
urgent viscode
header len and address
threshold to urgent

doorbells
free
ovrn
last
rx status word

[Figure: transmit path between ring in, mac and ring out]

setup registers:
doorbell, taskid and viscode
urgent viscode
send ahead address
threshold to transmit
threshold to urgent

doorbell
free entry count
finished frames count

Control Plane
Signaling Protocols

Data Plane
Per-packet handling
Forwarding Decision
Classification
QoS Handling
Queuing
Scheduling
Formatting

Circuit Emulation
AAL-0 (2) | AAL-1 (2) | AAL-2 (2)
ATM Layer Module (2)

External SDRAM
RxFIFO, RxDoorBell

AAL2, AAL1

Protocol specific data path functional blocks
Generic data path processing functional blocks
Generic system service functional blocks
Debug support
Vobla maintenance

FIG. 87

[Figure: device]

FIG. 86

FIG. 81